Integration method: Linked Inference of Genomic Experimental Relationships (LIGER) (Welch et al. 2019)
Data were normalized for different numbers of UMIs per cell.
Variable genes were selected for each dataset.
Data were scaled by root-mean-square across cells.
Cells with no expression across all genes, and genes with no expression across all cells, were removed.
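The preprocessing steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code scFlow or LIGER runs: the function name `normalize_and_scale` is hypothetical, the input is assumed to be a dense cells x genes count matrix, variable gene selection is omitted, and empty cells and genes are dropped first so that no division by zero occurs.

```python
import numpy as np

def normalize_and_scale(counts):
    """Illustrative preprocessing for a cells x genes UMI count matrix.

    Hypothetical helper, not part of scFlow or LIGER.
    """
    counts = np.asarray(counts, dtype=float)

    # Drop cells with no expression across all genes and
    # genes with no expression across all cells.
    keep_cells = counts.sum(axis=1) > 0
    keep_genes = counts.sum(axis=0) > 0
    counts = counts[keep_cells][:, keep_genes]

    # Normalize each cell by its total UMI count.
    normalized = counts / counts.sum(axis=1, keepdims=True)

    # Scale each gene by its root-mean-square across cells (no centering,
    # so the values stay non-negative for the factorization).
    rms = np.sqrt((normalized ** 2).mean(axis=0))
    scaled = normalized / rms
    return scaled, keep_cells, keep_genes

toy_counts = [[1, 0, 3],
              [0, 0, 0],
              [2, 0, 2]]
scaled, keep_cells, keep_genes = normalize_and_scale(toy_counts)
```

In the toy matrix, the second cell and the second gene are empty and are removed, leaving a 2 x 2 matrix in which every gene has a root-mean-square of 1.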
Key parameters:
Number of variable genes per dataset (individual) selected for integration: 3000
Total number of variable genes used for integration (the union across all individuals): 4850
Note: The length of the union across datasets (individuals) varied. Please check the Venn and UpSet plots below and make sure there is no outlier dataset(s).
Venn diagram of selected variable genes: The large numbers indicate how many variable genes are common between datasets. The datasets are represented by the numbers in the parentheses.
UpSet chart of selected variable genes: The first 6 vertical bars show the number of variable genes selected in a single dataset only, i.e. each dataset's exclusive contribution to the total variable genes used for integration.
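The quantities shown in the Venn and UpSet plots reduce to simple set operations. The sketch below is an illustration only: the function name `summarize_variable_genes`, the dataset labels, and the gene names are hypothetical, and it assumes the per-dataset variable genes are available as lists of gene identifiers. A dataset whose exclusive gene count is much larger than the others would be a candidate outlier.

```python
def summarize_variable_genes(genes_by_dataset):
    """Summarize overlap between per-dataset variable gene selections.

    Hypothetical helper: genes_by_dataset maps a dataset label to an
    iterable of gene identifiers.
    """
    sets = {name: set(genes) for name, genes in genes_by_dataset.items()}

    # Genes used for integration: the union across all datasets.
    union = set().union(*sets.values())

    # Genes selected in every dataset.
    shared_by_all = set.intersection(*sets.values())

    # Genes selected in exactly one dataset (the "isolated" UpSet bars).
    exclusive = {}
    for name, genes in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        exclusive[name] = genes - others

    return {
        "n_union": len(union),
        "n_shared_by_all": len(shared_by_all),
        "n_exclusive": {name: len(g) for name, g in exclusive.items()},
    }

summary = summarize_variable_genes({
    "dataset_1": ["GeneA", "GeneB", "GeneC"],
    "dataset_2": ["GeneB", "GeneC", "GeneD"],
    "dataset_3": ["GeneC", "GeneE", "GeneF"],
})
```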
An integrative non-negative matrix factorization was performed in order to identify shared and distinct metagenes (factors) across the datasets.
Corresponding factor/metagene loadings were computed for each cell.
Key parameters:
Number of Factors (inner dimension of factorization; k): 20
Penalty parameter which limits the dataset-specific component of the factorization (lambda): 5
Resolution parameter which controls the number of communities detected: 1
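For reference, the role of k and lambda can be read from the integrative NMF objective as formulated in Welch et al. (2019). The symbols below are not used elsewhere in this report and follow that paper: E_i is the scaled cells x genes matrix of dataset i, H_i holds the cell factor loadings, W holds the shared metagenes, and V_i holds the dataset-specific metagenes.

```latex
\min_{W,\, V_i,\, H_i \,\ge\, 0}\;
\sum_{i} \left\lVert E_i - H_i \,(W + V_i) \right\rVert_F^2
\;+\; \lambda \sum_{i} \left\lVert H_i V_i \right\rVert_F^2,
\qquad
E_i \in \mathbb{R}^{n_i \times m},\;
H_i \in \mathbb{R}^{n_i \times k},\;
W, V_i \in \mathbb{R}^{k \times m}
```

Here k = 20 sets the number of factors and lambda = 5 weights the penalty on the dataset-specific term, so larger values of lambda push more of the signal into the shared metagenes W.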
Data generated by PCA and LIGER were used as input for dimension reduction.
Dimension reduction was performed using tSNE, UMAP, and UMAP3D methods.
Visualisation of the batch effect using tSNE plots.
Quantification of the batch effect based on kBET (Büttner et al. 2019) test results. The rejection rate for each test represents the fraction of neighbourhoods whose batch label composition differs from the global composition of batch labels. An observed rejection rate that differs significantly from the expected rate indicates that the batches are not well mixed.
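The idea behind the rejection rate can be illustrated with a simplified sketch. It is not the kBET package implementation: the function name `kbet_rejection_rate` is hypothetical, the neighbour search is brute force, the values of `k` and `alpha` are arbitrary, and the sketch is restricted to exactly two batches so that the Pearson chi-square p-value (one degree of freedom) can be computed with the standard library.

```python
import math
import numpy as np

def kbet_rejection_rate(embedding, batch, k=20, alpha=0.05):
    """Fraction of k-nearest-neighbour neighbourhoods whose batch
    composition differs from the global composition.

    Simplified two-batch sketch, not the kBET package.
    """
    embedding = np.asarray(embedding, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two batches")

    # Expected label counts per neighbourhood under perfect mixing.
    global_freq = np.array([(batch == lab).mean() for lab in labels])
    expected = global_freq * k

    # Brute-force squared Euclidean distances between all cells.
    sq = (embedding ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * embedding @ embedding.T
    neighbours = np.argsort(dist, axis=1)[:, :k]  # includes the cell itself

    rejected = 0
    for idx in neighbours:
        observed = np.array([(batch[idx] == lab).sum() for lab in labels])
        stat = ((observed - expected) ** 2 / expected).sum()
        # Chi-square survival function for one degree of freedom.
        p_value = math.erfc(math.sqrt(stat / 2.0))
        rejected += int(p_value < alpha)
    return rejected / len(neighbours)

# Two batches far apart in the embedding: every neighbourhood is pure.
separated = np.concatenate([np.arange(50) * 0.01, 1000 + np.arange(50) * 0.01])
separated_rate = kbet_rejection_rate(separated[:, None], [0] * 50 + [1] * 50)

# Batches alternating along a line: every neighbourhood is balanced.
positions = np.arange(100, dtype=float)
mixed_rate = kbet_rejection_rate(positions[:, None], np.arange(100) % 2)
```

In these two toy cases the fully separated batches give a rejection rate of 1.0 and the perfectly interleaved batches give 0.0; real data fall between these extremes.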
Key parameters:
Clustering method: louvain
Number of nearest neighbors to use (k): 100
Resolution parameter that controls the granularity of clustering: 1e-05
scFlow v0.4.2 – 2020-04-23 10:06:59
Büttner, Maren, Zhichao Miao, F. Alexander Wolf, Sarah A. Teichmann, and Fabian J. Theis. 2019. “A test metric for assessing single-cell RNA-seq batch correction.” Nature Methods 16 (1): 43–49. https://doi.org/10.1038/s41592-018-0254-1.
Welch, Joshua D., Velina Kozareva, Ashley Ferreira, Charles Vanderburg, Carly Martin, and Evan Z. Macosko. 2019. “Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity.” Cell 177 (7): 1873–1887.e17. https://doi.org/10.1016/j.cell.2019.05.006.
A report by scFlow